DOCUMENT RESUME

ED 097 415                              CE 002 112

AUTHOR          Fortune, Jim C.
TITLE           Performance Test Development in Machine Shop: Appendix H, Final Report.
INSTITUTION     Massachusetts Univ., Amherst. Center for Occupational Education.
SPONS AGENCY    Massachusetts State Dept. of Education, Boston. Research Coordinating Unit for Occupational Education.; New York State Education Dept., Albany. New York Research Coordinating Unit.
PUB DATE        Jun 72
NOTE            39p.; For related documents see ED 060 218, ED 085 541, CE 002 111-117
EDRS PRICE      MF-$0.75 HC-$1.85 PLUS POSTAGE
DESCRIPTORS     *Behavioral Objectives; *Criterion Referenced Tests; Educational Objectives; Industrial Arts; *Item Banks; Machinists; *Mechanics (Process); *Performance Based Education; Shop Curriculum; Test Construction; Trade and Industrial Education; Vocational Education
IDENTIFIERS     ESCOE; Massachusetts; New York



ABSTRACT

This is one of the outcomes of the work of the Massachusetts Evaluation Service Center for Occupational Education (ESCOE). The first part of this document is an overview of the Performance Test Development Project. The remainder of the document explores the machine shop curriculum in terms of terminal behavioral objectives, which were grouped by desired performance. Each performance group was synthesized into a single multifaceted objective (synthesized objective). Blueprinting was selected as the test item form to be used in the initial field test, which is described at length in terms of test description (general form and administration procedures), field testing, and revision recommendations. Tables and

OVERVIEW OF THE PERFORMANCE TEST DEVELOPMENT PROJECT

Background

The 1968 Amendment to the Federal Vocational-Technical Education Act mandated the development of statewide evaluation systems for the administration and operation of federally supported vocational education. Parallel to this mandate, the Research Coordinating Unit director for the Commonwealth of Massachusetts was in the process of completing some predesign activities for the development of a vocational-technical education management information system. By 1969 the predesign of this system had moved into the feasibility stages, and specifications of the system were being developed.

At this stage of development New York State, which already had a fine centralized testing program, became interested in the philosophy espoused by the Massachusetts system and joined in the funding of a more intense feasibility test, which eventually became the source of the Performance Test Development Project. The Evaluation Service Center for Occupational Education (ESCOE) was funded in July 1971 and was housed in Amherst, Massachusetts, to test the feasibility of systems development based upon the principles of (1) local control and development of vocational curricula, (2) data-based feedback derived from tailored performance tests, and (3) curriculum description through terminal behavior objectives. The following report deals with a subcomponent of the ESCOE system which was designed to develop performance tests as software support for the ESCOE program.

Whats and Whys of Performance Testing 

Performance testing is more a new reality than a new concept in educational testing. The concept grows out of the need felt by educators to sample actual performances of trainees, as opposed to merely measuring symptoms of desired (or intended) competencies through paper-and-pencil tests and then relying upon the predictive powers of the test (i.e., previously established associations of paper-and-pencil test scores with some hypothetical or observed criterion of competency in performance) to infer competency acquisition. This felt need has grown in part from the inability of standardized achievement tests to deal with the unique objectives of a specific educational program, in part from the reportedly low correlations between measured skills and on-the-job (or in-the-shop) performance, and in part from the lack of realism involved in the paper-and-pencil testing situation.

Hence the performance test can be conceived of as a criterion-referenced test, in that (1) it is objective- or criteria-centered (in one-to-one correspondence with the extant component of a stated objective); (2) it seeks to ascertain a subject's possession of a specific competency rather than to complete a comparison of the subject's competency level to a previously measured norm group; and (3) it usually requires a dichotomous decision as to whether the competency has been demonstrated. The performance test can be construed to be a special case of the criterion-referenced test in that there is a definite attempt to establish fidelity between the sample observation of the performance test and the performance being sampled.
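To make the contrast between points (2) and (3) concrete, the sketch below scores observations both ways. It is a minimal illustration under assumed inputs: the criteria names, the norm-group scores, and both function names are hypothetical and are not part of the ESCOE materials. The criterion-referenced decision depends only on whether every stated criterion was met; the norm-referenced score depends only on where a subject falls relative to other people.

```python
from bisect import bisect_left


def criterion_referenced(checks: dict[str, bool]) -> bool:
    """Dichotomous decision: competency shown only if every criterion is met."""
    return all(checks.values())


def norm_referenced(score: float, norm_group: list[float]) -> float:
    """Percentile rank: percent of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_group)
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)


# Hypothetical criteria for one turned part (illustrative only).
checks = {
    "diameter within tolerance": True,
    "length within tolerance": True,
    "finish acceptable": False,
}
print(criterion_referenced(checks))  # False
print(norm_referenced(72, [55, 60, 65, 70, 75, 80, 85, 90]))  # 50.0
```

The first function never consults other students; the second is meaningless without a norm group.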

In the evaluation of instructional programs in vocational-technical education, the concept of performance testing is especially appropriate for several different reasons. First, performance tests can be hypothesized to produce more relevant and valid data concerning the instructional program output. Vocational program objectives tend to deal with competencies which require concurrent behavior changes across several domains of instructional objectives. Hence the accomplishment of a vocational objective may depend upon the development of a psychomotor skill, the mastery of a cognitive process, the acquiring of some fundamental facts, and the development of a particular attitude. Unlike paper-and-pencil tests, which emphasize the measurement of the cognitive aspects of the performance, or observations, which emphasize process and action components, performance tests possess the potential to measure the mixture of behavior domains appearing in the desired performance. The performance test can therefore be argued to offer a valid means of measuring intended outcomes.

Second, performance tests produce product records which can be studied by teachers to diagnose the place in the instruction where a weakness may have occurred, aiding considerably their ability to analyze their instructional methods. Since the teacher can determine what aspects of the competency are missing, he can trace the point in his instruction where his objectives were not met. Also, since the product is concrete, it can be kept longitudinally to analyze pupil growth at different stages of a multi-year program.

Third, the data produced by performance testing have the flexibility demanded by the information needs of an evaluation system. The tests are constructed in one-to-one correspondence to stated objectives, thus enabling selection of test components from a data bank in such a manner as to tailor the testing to the measurement of a unique set of program objectives. Since the tests are objective-specific, comparisons of small aspects of an instructional program are possible. Since the tests are criterion-referenced, skill attainment in a particular area of interest can be ascertained; hence output of instructional programs can be described relative to percentage of skill development.
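A minimal sketch of how dichotomous item results keyed to objectives could be rolled up into a program-level "percentage of skill development." The objective codes, the record layout, and the choice to report the share of tested students who demonstrated each objective are assumptions made for illustration, not the project's reporting procedure.

```python
from collections import defaultdict

# Each record: (student, objective code, demonstrated?). Codes are hypothetical.
results = [
    ("s1", "MS-01", True), ("s1", "MS-02", False),
    ("s2", "MS-01", True), ("s2", "MS-02", True),
    ("s3", "MS-01", False), ("s3", "MS-02", True),
    ("s4", "MS-01", True),  # s4 was not assigned the MS-02 item
]


def percent_demonstrated(records):
    """Percent of tested students who demonstrated each objective (1 decimal)."""
    tested = defaultdict(int)
    passed = defaultdict(int)
    for _student, objective, ok in records:
        tested[objective] += 1
        passed[objective] += int(ok)
    return {
        obj: round(100.0 * passed[obj] / tested[obj], 1)
        for obj in sorted(tested)
    }


print(percent_demonstrated(results))  # {'MS-01': 75.0, 'MS-02': 66.7}
```

Because a student takes only the items selected for his program, the denominator is the number of students actually tested on each objective, not the class size.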
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Rest rn tnts o n Tes t Dtyvr lo|)ment 

The design of the performance tests had to take into account both the philosophical and the operational structure of ESCOE. At times both of these structures served as restraining and occasionally frustrating hurdles for the test development team.

The philosophical nature of ESCOE provided the foundation of principles which are believed to have made the performance tests unique. Since the objectives were generated by each local school, several very similar objectives appeared for a single behavior within a subject. Dr. David Berliner, now with the Far West Laboratory for Educational Research and Development, invented a process to restate these similar objectives in a synthesized form, accompanied by item changes providing for the unique characteristics of each objective. Thus, if enough objectives from different schools were collected to represent the curricula, by synthesizing those objectives one could arrive at a statement of all desirable behaviors within one curriculum.

The raw objectives based upon the curricula of each of the participating schools were synthesized to identify the major behaviors within a curriculum area. Hence, if the process worked ideally within a curriculum area, a linear set of behaviors was produced. The degree to which this process failed to produce such a linear array of behaviors comprised the first major restraint. If a singular listing of behaviors could not be gained, then singular test items could not be written.

A second philosophical principle which developed into a restraining factor was the decision to test only locally maintained objectives within a specific program. This principle involved several implications for testing. First, a student would be tested only on the objectives maintained by the curriculum he was receiving. Therefore, the test items had to be described in a form indicating one-to-one correspondence with the synthesized objectives, so that the local teacher could select only those items maintained for his course. This selection pattern did, however, strengthen the logical assumption that the tests possessed high validity for the courses whose outcomes they were designed to measure. Second, each item had to be independently administrable, since previous or adjoining items would not necessarily be administered with it. This item independence served as a restraint to test development in that objectives could not be clustered into tasks involving several test items.

t 
The third restraint involved both philosophical and operational aspects, in that two forms of scoring were preferred by the two cooperating states. Philosophically, the state coordinators differed on the location of scoring; this disagreement became a restraint to test development in that the items developed had to be scorable both in the local school and at a central test center. Three forms of scoring meeting this restraint were adopted, with the choice of scoring form depending upon the nature of the individual item. Two of these forms meet the restraint with a single scoring process. The third form requires two different processes in order to meet the dual scoring restraint.

The scoring approaches requiring only one process are (1) the caliper or mechanically scored form and (2) the selection of correct response form. In the mechanically scored approach, several measured settings can be placed in a test scoring kit; the student or teacher records by label which of the settings fits the final product. A key of correct setting labels can then be referred to, producing a dichotomous score for the product in terms of size tolerances. In the selection of correct response approach, correct item keys can be applied directly to the students' responses. In both cases either a central office or an individual classroom teacher can use the keys.
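The mechanically scored form can be sketched as a lookup: the student records which labeled setting in the scoring kit fits the product, and the key turns that label into a dichotomous score. The item numbers, setting labels, and key below are hypothetical; the report does not specify the kit's labeling scheme.

```python
# Hypothetical key: for each item, the label of the kit setting that
# corresponds to an in-tolerance size.
KEY = {"item-07": "C", "item-12": "A"}


def score_item(item: str, recorded_label: str, key: dict[str, str] = KEY) -> int:
    """Return 1 if the recorded setting label matches the key, else 0."""
    return int(key[item] == recorded_label.strip().upper())


responses = {"item-07": "c", "item-12": "B"}
scores = {item: score_item(item, label) for item, label in responses.items()}
print(scores)  # {'item-07': 1, 'item-12': 0}
```

The same lookup would serve the selection of correct response form. Because the key is only data, a central office and a classroom teacher apply it identically, which is what lets these two forms satisfy the dual scoring restraint with one process.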

The third scoring form is not as simple, since two types of scores are required to meet the dual-use restraint. This scoring form is necessitated by the many tasks in the vocational curriculum which require expert observer judgment to determine performance quality. The two types of scoring needed for these items are (1) structured criteria for observation and (2) pictorial records (color-coded to facilitate central scoring). The structured criteria for observation communicate to the teacher what aspects of the product to check in order to judge the performance successful. These criteria would be used in class. For the pictorial scoring process, camera angles have been described which would allow Polaroid pictures to be taken as records of the finished product. Color-coding the criteria checks would enable observers in a central location to determine the quality of the performance.
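One way to picture the dual record for judgment-based items: each structured criterion carries a color code, the teacher checks criteria against the product in class, and a central scorer checks the same color-coded criteria against the photograph. The criteria, the colors, and the all-criteria-met pass rule are assumptions for illustration; the report does not say how partial satisfaction of the criteria was handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    color: str        # hypothetical color code marked on the photo record
    description: str  # what the observer looks for


# Hypothetical criteria for a hand-finished part.
CRITERIA = [
    Criterion("red", "no visible tool chatter"),
    Criterion("blue", "edges deburred"),
    Criterion("green", "chamfer uniform all around"),
]


def judge(checked_colors: set[str]) -> bool:
    """Pass only if every criterion's color was checked by the observer."""
    return all(c.color in checked_colors for c in CRITERIA)


in_class = judge({"red", "blue", "green"})
central = judge({"red", "blue"})  # e.g. chamfer not visible from the photo angle
print(in_class, central)  # True False
```

A disagreement like this one is the situation the report returns to under its third problem area, where an item may score more usefully in one setting than in the other.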

Each of these three approaches provides a means through which credible and unbiased scores can be obtained. All of the processes can be scored by individual teachers and used within a classroom setting without the aid of a central scoring station. The fourth restraint to test development arises at this point, since it is impossible to arrive at an immediately usable set of norms through the current scoring system and the dichotomous item response without implementation of a program designed to gather enough data to norm the tests.

Two other restraints were present throughout the test development project, both operational in nature. First was the quality and quantity of the behavioral objectives themselves. Few if any of the curriculum areas were fully described, and the tests developed are limited to described curriculum. In two test areas, more items were developed and the synthesis process was repeated in order to sharpen the synthesized objectives. In these cases, much curriculum had been left undescribed, and the fill-in process aided considerably in extending the descriptions. However, complete and multiple sets of items were not available from each school; therefore the test items may be lacking in content validity and, in cases of consultant-written items, may be representative of several behaviors and hence difficult to test, or may represent only a small segment of the previously unwritten curriculum.

The second operational restraint was that of time. Although the budget was small, the very close deadlines in development work made time an even greater restraint. Creativity is sometimes especially elusive under deadlines and within the constraints of administrative conflict. Still, the time dimensions were met in terms of design. Since schools were closed during the critical month of June, illustrations of some items of the tests could not be produced; therefore only plans, item descriptions, materials descriptions, and administration instructions could be developed.

A final restraint can be observed in the language in which the proposal was written. First, several terms apparently changed in meaning or in relevance to the project once development began. One apparent change occurred in the description of sixteen tests for four areas. One test for each level of a curriculum area cannot be developed so as to be equally relevant to all schools. Since the schools maintain different objectives, different items must be assigned to each school, even on the same level. Hence a more appropriate process becomes the development of an item bank from which tailored tests can be developed for each individual program. Second, the time restraints and the differences in the nature of the curricula required different kinds of tryouts, making the language of the proposal seem sometimes inappropriate.

Purposes of the Test Development Project

The design of the test development project included not only the goal of producing tests as products but also the goal of establishing feasibility of the test development effort across a broad spectrum of vocational-occupational curricula. For this reason four different areas of vocational curricula were selected for test development. These four areas differed in hypothetical difficulty of test development. The areas chosen were machine shop, woodworking and carpentry, electronics, and automobile mechanics. The automobile mechanics area was hypothesized to be the most difficult, since manufacturers determined the curriculum, which therefore differed across competing manufacturers.

The performance tests were hypothesized to be sufficiently flexible to fulfill many purposes of a comprehensive evaluation system. Because of their proximity to the desired outcomes, performance tests were hypothesized to serve as (1) student diagnostic and prerequisite instruments, (2) diagnostic instruments for the analysis of instruction, (3) criterion instruments, (4) measures of classroom achievement, and (5) program success indicators. Each of these uses has already been piloted to some extent.

The performance tests as developed have several application conveniences. First, since the test items are paralleled to synthesized objectives, computer selection of test items, or "synob" (synthesized objective) comparison of items, can be used as a methodology for tailoring tests to instruction. Second, since the conceptual frames of the tests can be described, each test has built-in potential for updating or extension by the classroom teacher.
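Because every item is paired with exactly one synthesized objective, computer selection can be sketched as filtering the bank by the synobs a given program maintains. The in-memory dictionary, the synob codes, and the item identifiers below are hypothetical; the report does not describe how the bank was stored.

```python
# Hypothetical item bank: item id -> synthesized objective ("synob") code.
ITEM_BANK = {
    "B-01": "SYN-LATHE-TURN",
    "B-02": "SYN-LATHE-THREAD",
    "B-03": "SYN-MILL-KEYWAY",
    "B-04": "SYN-GRIND-SURFACE",
}


def tailor_test(program_synobs: set[str], bank: dict[str, str] = ITEM_BANK) -> list[str]:
    """Return the items whose synob the program maintains, in bank order."""
    return [item for item, synob in bank.items() if synob in program_synobs]


# A program that does not teach surface grinding never receives item B-04.
print(tailor_test({"SYN-LATHE-TURN", "SYN-LATHE-THREAD", "SYN-MILL-KEYWAY"}))
# ['B-01', 'B-02', 'B-03']
```

Item independence is what makes such a filter safe: no selected item assumes that an unselected one was administered first.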

Problems Encountered

Problems occurred from three viewpoints. First was the problem of lack of known direction, a handicap which often occurs in the area of development. Second was the problem of lack of perfection or completion of the objectives used as raw materials for the development of test items. Third was the problem of contending with dual scoring requirements and with several different kinds of program emphasis and structure.

The first problem has been emphasized recently by the development work done on criterion-referenced testing. From a conceptual point of view, the criteria previously used to determine the quality of norm-referenced tests can no longer be used for criterion-referenced tests. Since the measurement strategy of the criterion-referenced test and the performance test is to determine the possession of either a skill or the capability to carry out an activity or process, the degree to which the test differentiates between subjects taking the tests does nothing to indicate test quality. Unlike the norm-referenced test, in which the measurement strategy is to distinguish between subjects, the performance test cannot be hypothesized to produce large differences across subjects, nor can any specific level of difficulty be expected. Hence, average levels of difficulty and large differences between subjects do not indicate quality of the performance test.
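The point can be checked with simple arithmetic. For a dichotomous item passed by a proportion p of subjects, the item variance is p(1 - p). If instruction has succeeded and nearly everyone demonstrates the competency, p approaches 1 and the variance, on which norm-referenced item statistics depend, approaches zero, even though the item may be measuring exactly what it should. The response data below are invented for illustration.

```python
def item_stats(responses: list[int]) -> tuple[float, float]:
    """Return (difficulty p, variance p*(1-p)) for 0/1 item responses."""
    p = sum(responses) / len(responses)
    return p, p * (1 - p)


after_instruction = [1] * 10   # everyone demonstrates the skill
mixed_group = [1, 0] * 5       # half demonstrate it

print(item_stats(after_instruction))  # (1.0, 0.0)
print(item_stats(mixed_group))        # (0.5, 0.25)
```

A norm-referenced reading would discard the first item as non-discriminating; a criterion-referenced reading would treat it as evidence of a successful instructional outcome.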

In performance testing, some concepts of reliability still appear useful, while others appear to have lost their relevance. Reliability over time, or test-retest reliability, is still meaningful as long as the time between tests does not include opportunity for the subject to acquire the skill in question. Since performance tests are designed so that each item does not necessarily refer to the same skill or activity, reliability indices dealing with homogeneity of the test no longer appear to be relevant criteria for test quality.
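For dichotomous mastery decisions, one common way to express test-retest reliability is the proportion of subjects classified the same way on both occasions, often reported with Cohen's kappa to discount chance agreement. These are standard indices offered only as an illustration; the report does not say which index the field test used, and the data are invented.

```python
def agreement(first: list[int], second: list[int]) -> tuple[float, float]:
    """Return (proportion agreement, Cohen's kappa) for paired 0/1 decisions."""
    n = len(first)
    p_o = sum(a == b for a, b in zip(first, second)) / n
    p1, p2 = sum(first) / n, sum(second) / n
    p_e = p1 * p2 + (1 - p1) * (1 - p2)  # agreement expected by chance
    # Convention assumed here: if chance agreement is 1, both runs are
    # identical constants, so kappa is reported as 1.0.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0
    return p_o, kappa


test = [1, 1, 1, 0, 0, 1, 0, 1]
retest = [1, 1, 0, 0, 0, 1, 0, 1]
p_o, kappa = agreement(test, retest)
print(round(p_o, 3), round(kappa, 3))  # 0.875 0.75
```

The caveat in the text still applies: if instruction occurred between the two administrations, a changed classification reflects learning rather than unreliability.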

The degree to which the items of a performance test cover the skills of an area and approximate actual required performances bears the same relationship to the performance test that a prediction index bears to a norm-referenced test. This degree of similarity might be compared to the concept of fidelity so often used in the recording industry.

The second problem area involved the quality of the raw materials used. As should be expected, the synthesis process does not apply evenly to all areas and was not applied with the same consistency to each set of objectives. In the machine shop curriculum area, between 70 and 80 percent of the content was described by the objectives. These objectives possessed adequate depth across skill areas to enable the synthesis process to produce clear synthesized objectives describing unique performances. The creation of items parallel to the synthesized objectives, and possessing the independence and flexibility required by the philosophy of the system, was a straightforward process.

In the woodworking area, between 60 and 75 percent of the content was described. Unfortunately, the synthesizers of the raw objectives failed to produce synthesized objectives which dealt only with single performances. Instead, the raw objectives were synthesized by similar or related behaviors, and the product of this process was a matrix of similar performances (rather than a single performance) with several form changes denoting differences in conditions and extents across schools. Since these products seemed usable, the decision was made to produce a matrix of test items generated in one-to-one correspondence to the performances included in each synthesized objective. This decision was the source of some time lost due to the expanded number of test items which had to be written; however, this increase in items was accompanied by a large increase in test specificity, which increases the degree to which the performance test can be tailored to fit a given instructional program without any noticeable loss of efficiency of the item banking process.

Due to the variance of material and the limited scope of the objectives developed for the electronics curriculum area, a decision was made to rewrite many of the synthesized objectives. For more than one-half of the contract period, two of the test development team members struggled to find a format within which the scope of the electronics curriculum could be described. By expanding the number of conditions, it was found that classes of performance could be described by synthesized objectives. Hence, through considerable redesign and a small set of compromises of the synthesis process involving uniqueness of performances and allowance of performance form changes, subcollections of electronics objectives could be written which would allow test development along conceptual lines similar to those followed in the development of the machine shop test. The test development effort again produced item banks, as in the two previous test areas, with the items possessing similar relationships to the synthesized objectives.

In the area of automobile mechanics, less than 50 percent of the content was described by the raw objectives. Many of the subdivisions of content were too sparse to allow for the development of synthesized objectives. In addition, the synthesis process applied seemed irregular across blocks and units. The level of abstraction of behaviors described by the raw objectives and the interdependence of the performances raise questions concerning the appropriateness of the synthesis process in this area. Certainly, the limited number of usable synthesized objectives and the necessary revisions of the existing objectives made the decision to rewrite the objectives essential. Revisions of the curriculum descriptions were made in relation to the job orientation of the curriculum. Test items were written around standard mechanics tasks as described in the automobile mechanics curriculum. In some of these items, synthesized objectives are tested in a format which includes a cluster of the objectives provided by the ESCOE system. In other items, only parts of ESCOE-produced objectives are included in the new synthesized objectives being tested. Once a test item has been constructed, the process can be reversed so that the system capability achieved in the other three test areas can be gained. Because of their time-consuming nature, tasks in the curriculum such as disassembly or reassembly of a motor or transmission were not included as complete test items. Instead, either sample tasks extracted from the large, unmanageable task or written or pictorial selection items were created to test these phases of the curriculum.

The third problem area encountered was the difficulty involved in the existence of two separate scoring requirements and in the time limitations of the test development project. It was not always possible to produce useful in-class scoring of the performance item and credible, objective centralized scoring of the performance through application of the same scoring process. Therefore some items are suspected to produce more useful scores in the classroom than in a central scoring situation, while the reverse is suspected of other items. Only time and study of the tests can alter or affirm these suspicions. It is unfortunate that systematic refinement of the woodworking, electronics, and automobile mechanics tests is not planned to occur along the same lines as those applied to the machine shop test.
The following report includes development and field testing procedures, item bank descriptions, and recommended analysis procedures and uses for one of the four test areas briefly described above.






Introduction 



The machine shop test has often been referred to as the Booth test, in reference to Russell Booth, the original developer of the basic test format. The test is the pioneer performance test upon which many of the basic ideas of the ESCOE system were initially implemented. It was the first test developed because of the completeness of the objectives obtained for the machine shop curriculum area in the Massachusetts pilot test. This completeness of description by objectives was not replicated in the remainder of the ESCOE project, perhaps indicating some uniqueness of the curriculum area itself.

Work with the objectives in this area indicates that the skill aspects of the curriculum are easily adapted to the behavioral objective format. Performances are straightforward and can be described in terms of relatively independent activities. The performance of the activities requested by the objectives takes place in a somewhat standardized environment with a set of alternative behaviors. Thus each synthesized objective states a fixed desired performance which is to occur in a relatively standardized environment.



Conceptual Scheme for Development 

Once each LEA (local education agency) had described its machine shop curriculum in terms of terminal behavioral objectives, the objectives were grouped by desired performance. Each performance grouping was then synthesized into a single multifaceted objective called a "synthesized" objective. The synthesized objective stated the unique performance and denoted the uniqueness of each LEA through the inclusion of form changes, which were recorded for both condition and extent. This synthesized objective was used for the conceptualization of the machine shop test, and a test item was created in one-to-one correspondence to each synthesized objective.
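The synthesis step can be pictured as a grouping operation: raw objectives from each LEA are grouped by desired performance, and each group becomes one synthesized objective that keeps the shared performance and records each LEA's condition and extent as form changes. The field names, LEA labels, and sample objectives below are hypothetical. The genuinely hard part, deciding that two differently worded objectives describe the same performance, was human judgment and is assumed away here by giving each raw objective an already normalized `performance` string.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RawObjective:
    lea: str          # local education agency that wrote the objective
    performance: str  # desired performance, assumed already normalized
    condition: str    # given what (tools, materials, drawing)
    extent: str       # to what standard (tolerance, time)


@dataclass
class SynthesizedObjective:
    performance: str
    # LEA -> (condition, extent): the "form changes" for that LEA
    form_changes: dict[str, tuple[str, str]] = field(default_factory=dict)


def synthesize(raw: list[RawObjective]) -> list[SynthesizedObjective]:
    """Group raw objectives by performance; one synthesized objective per group."""
    groups: dict[str, SynthesizedObjective] = {}
    for obj in raw:
        if obj.performance not in groups:
            groups[obj.performance] = SynthesizedObjective(obj.performance)
        groups[obj.performance].form_changes[obj.lea] = (obj.condition, obj.extent)
    return list(groups.values())


raw = [
    RawObjective("LEA-A", "cut external thread with die", "given a 1/2-13 die", "thread gauge fits"),
    RawObjective("LEA-B", "cut external thread with die", "given a blueprint", "within 1 hour"),
    RawObjective("LEA-A", "drill and tap hole", "given a drill press", "tap gauge fits"),
]
synobs = synthesize(raw)
print(len(synobs), sorted(synobs[0].form_changes))  # 2 ['LEA-A', 'LEA-B']
```

Each resulting object then anchors exactly one test item, which is the one-to-one correspondence the paragraph describes.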

The process through which the test item was ultimately formulated was one of communication between a psychometrician and a consultant machine shop teacher. Beginning with the basic synthesized objective, the vocational educator offered a verbal description of the performance demanded. The expected time required to complete the performance and the uniqueness of the required performance were then discussed. Following this verbalization, the psychometrician converted the performance description into a test event, which the vocational educator translated into a form which could be used to communicate the task to students. In this case the form was blueprinting. After the test item had been agreed upon, criteria for successful performance were discussed and a scoring scheme was devised. This process was repeated for each synthesized objective until the curriculum was completed.

In the pilot run and in the basic first test developed, only about one-third of the curriculum was tested. Field tests of this pioneer unit showed promise and suggested some focus changes as well as the addition of detailed test administration directions. The past year's work expanded this test to include approximately 75 percent of the curriculum and established a field administration trial of the total test.

Test Description

General form. In addition to expansion of the original machine shop test, several teacher administration options were created. Figure 1 shows the item selection form, which allows the teacher to designate which items are to be taken by selected groups of students. Thus the teacher can elect to test students on only a few of the items or on all of them, can elect to test the whole class or only part of the class, and can elect for the students to pursue the same testing program or for each student to complete a different selection of items. The item selection form provides a vehicle through which the teacher can report his testing intentions if the system makes this option available.

The total test is organized into operations performed on two pieces which can eventually be put together. Piece One covers 19 operations and Piece Two includes 16 operations, as illustrated on the form in Figure 1. Using standard grading practices, an additional 17 supervisory and grading operations can be built into the testing process. Therefore, 52 terminal objectives can be measured by the performance test; 35 of these performances result in products which can be carried across years or levels of a student's program, yielding a visual product record of his growth. The 17 remaining performances, most of which fall toward the completion segment of the curriculum, can be kept in written record form. Figure 2 illustrates a potential form for recording students' completion of grading items.
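The counts in this paragraph fit together as 19 + 16 = 35 product operations, plus 17 supervisory and grading operations, for 52 terminal objectives. A minimal sketch of a per-student record in the spirit of Figure 2 follows; the category names and the "done/total" layout are assumptions, since the text does not give the figure's layout.

```python
# Operation counts as stated in the text.
COUNTS = {"piece_one": 19, "piece_two": 16, "grading": 17}

PRODUCT_OPS = COUNTS["piece_one"] + COUNTS["piece_two"]  # kept as physical products
TOTAL_OBJECTIVES = PRODUCT_OPS + COUNTS["grading"]       # all measurable objectives


def completion_summary(completed: dict[str, int]) -> dict[str, str]:
    """Format 'done/total' per category for one student (hypothetical layout)."""
    return {cat: f"{completed.get(cat, 0)}/{total}" for cat, total in COUNTS.items()}


print(PRODUCT_OPS, TOTAL_OBJECTIVES)  # 35 52
print(completion_summary({"piece_one": 12, "grading": 3}))
# {'piece_one': '12/19', 'piece_two': '0/16', 'grading': '3/17'}
```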

The total test was conceived to require eight to twelve hours for completion, and only the shop tools normally present in any machine shop instructional setting are needed for the administration of the test. In fact, it is felt that greater instructional validity is gained when the students are tested on the same shop equipment on which they were instructed. Hence no change of equipment or additional equipment is needed to conduct the test.

Administration procedures. As previously stressed, the machine shop test requires only the normal machinery or equipment used in instruction. The only additional materials needed are (1) the pieces of stock, (2) the instructions for administration, (3) the blueprints describing the test items, and (4) the optional grading kit.

PIECE ONE OPERATIONS (partial list)
Layout various "steps"
Straight turn all diameters
Perform necking operations
Cut chamfers
Cut thread to be "chased" (tool bit)
Reverse piece
Shoulder turn
Taper turn
Cut thread with die
Drill and tap hole
Cut Woodruff keyseat
Inspection
Repair center holes
Cylindrical grind (taper)
Inspection (taper and finish)

PIECE TWO OPERATIONS
Cut off stock (power saw)
Plain mill (or end mill) to thickness
Position stock in chuck of lathe
Drill, bore and cut thread
Inspection for thread
Position stock in vertical n.o.
Bore out the thread
Cut keyway
Numerical control
Press piece on mandrel
Turn on and inspect
Mount on index centers
Cut gear teeth or cutter teeth and inspect
Heat treat and inspect
Cutter grinder and inspect (if necessary)
Surface grind both sides

Figure 3 shows the administration instructions used during the field tests. Obviously, these instructions are highly dependent upon plans for data use and upon the system's requirements. Different instructions will be necessary if all students are to take the same test. Figure 4 records the instructions given to the students. Again, these instructions depend upon system requirements for much of their content. For instance, if the test were to be centrally scored or graded, the students would be instructed to use the grading kit rather than the inspection center procedure.

Figures 5 and 6 illustrate the test kit used for the field tests.

Field Testing 

The machine shop test was field tested in three schools over several 
levels of students (Levels 1, 2 and 3 in four-level programs; Levels 1 and 
2 in three-level programs) at two different times. The field test was 
designed to (1) develop estimates of required testing times, (2) produce 
estimates of item difficulty, (3) produce some estimates of test-retest 
reliability, (4) try out administration instructions, (5) try out record- 
ing forms, (6) gain the reactions of shop instructors to the tests, and 
(7) indicate directions for future revisions. Although the sample size 
was greatly decreased by poor timing and by a somewhat reactive attitude 
in the field to the fact that the ESCOE system was being terminated, thus 
TO THE STUDENTS 

You have been selected to work on a government-sponsored study of 
vocational education, specifically machine shop. This is a "field 
test" so that we can modify and improve the test before giving it to 
more students. 

Your performance on these skills (in the lathe, drill press, and 
bench work areas) will not be counted toward your course work. A 
checklist of performance (included in your test kit) will be kept, 
as you will be bringing your test piece to the inspection center for 
checking after each step is performed. However, if the inspection 
area is full or the checker is not available, just go on to the next 
step. 

Your test kit contains either one or two pieces to be worked with, 
a sequence of operations with instructions, a blueprint, and the 
checklist already mentioned. 

You are not expected to be able to perform all the operations, and 
no one's work will be perfect. Do not do those operations you have not 
had experience with. 

Are there any questions now, before you get your test kit? 

(Pass out kit) 

Any further questions? 

Please keep this piece and work on it whenever you have completed 
any of the items. 

FIGURE 4 
SEQUENCE OF OPERATIONS (PIECE #1) 

Form columns: Block, Unit, Dimensions, Time Required (Day #1, Day #2), Satisfactory, Unsatisfactory 

Center drill both ends 
Straight turn 
Necking 
Chamfer 
Shoulder turn* 
Taper turning 
Die-cut 5/8-18 
Drill (for tap) 
Tap (1/4-20) 
Cut ext. thread 3/4-10** 
Mill keyway 

Start / Subtract breaks / Finish / TOTAL 

NOTES (PIECE #1) 

* Piece has been reversed between the centers. 

** If the freshman is capable of performing this operation, it will be the 
sixth operation in sequence. 

Sizes of threads may be changed to accommodate available taps and dies. 

Dimensions for external threads are given for thread micrometer measurements. 
If the LEA teacher wishes to test the three-wire system, the students will 
compute the measurement. 

At the discretion of the local teacher, a straight slot may be substituted 
for the Woodruff key slot. 
SEQUENCE OF OPERATIONS (PIECE #2) 

Form columns: Block, Unit, Dimensions, Time, Satisfactory, Unsatisfactory 

1. Drill (for bore or ream) 
2. Bore or ream (1.000") 
3. Cut keyway (.250 x .125) 
4. Face off or mill to thickness (.270) 
5. Straight turn (1-1/8) 
6. Prepare program (N.C.) 
7. Prepare tape (N.C.) 
8. Drill (N.C.)* 
9. Cylindrical grind (3.750) 
10. Set up index head (no error allowed) 
11. Cut milling cutter teeth (1/2" deep) 
12. Harden and temper 
13. Grind clearance angles 
14. Surface grind (.250) 

Start / Subtract breaks / Finish / TOTAL 

NOTES (PIECE #2) 

No decision has been made on inspection of the drilling operations. The position of 
the holes may be checked by measuring over inserted plugs (pieces of drill rod), and 
this would necessitate adding a reaming operation to the procedure. 

Because of the 1/4" thickness of the piece, it may be necessary to produce a backup 
plate. 

Tolerances on dimensions will be determined by the LEA to conform to its objectives. 

Sheet #1 will be used if numerical control is tested. The holes will subsequently 
be milled out of the piece when the gear or milling cutter is made on the milling 
machine. It is suggested that the milling cutter be made because: 

1) The 60° angle cutter is more likely to be available in the shop. 

2) Objectives pertaining to cutter grinding can be tested. 

3) The resulting cutter may be put to practical use in the shop. 

4) The results of the heat treating will be more impressive when the cutter 
is used. 

5) Objectives relating to gears can be effectively tested on written tests. 



FIGURE 6 



Blueprint for Piece #2 (dated Sept. 19/71). Legible notes: Tolerances unless otherwise specified, fractional dimensions ±1/64". Leave .020" on O.D. for grinding. Teeth cut to .300" deep radially; use 60° angle cutter; tool and cutter grinder. Width of land 1/16". Bore and ream hole 1.000". Width of keyway .250". Diameter 3.750. 


preventing the administration of some posttests in the reliability study, 
the field test was deemed a success on all seven purposes. 

Table I presents the distribution of students cooperating in the 
test. Results from any one group are not reported individually, since 
comparison of schools was not part of the field test. This table shows only 
participation in the testing and does not include separate item counts, 
which are reported in later tables giving statistics for individual items. 
It should be noted here that not all students took every item and that not 
all students participated in the second testing. 

Table II records the average time required for completion of each item, 
as well as the number of students completing each item. This table also 
indicates both maximum and minimum times to completion by levels of 
students across testing sites. It can be seen from Table II that the 
variance of time needed to complete each item was greater than expected, 
causing the total time required per center to range from eight hours to 
16 hours: eight hours was the minimum estimated, and 16 hours is four hours 
longer than the maximum estimated (twelve hours). The average time of 9.8 
hours is just under the predicted ten hours. These time estimates should be 
sufficiently accurate to provide the test administrator with completion time 
estimates for any combination of items. However, the field test did indicate 
a need to improve or standardize time-keeping procedures and to provide a 
better-organized form for recording completion time. 
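Because Table II gives an average completion time for each item, an estimate for any combination of items is the sum of the chosen averages. The sketch below illustrates this under stated assumptions: the dictionary `AVERAGE_MINUTES`, keyed by (piece, item), holds only a few of the legible Piece Two averages from Table II, and the helper names `estimate_minutes` and `completion_minutes` are hypothetical. The second helper shows how the Start, Finish and Subtract breaks entries on the sequence-of-operations forms would combine into a completion time.

```python
from datetime import datetime, timedelta

# Average completion times in minutes, keyed by (piece, item).
# Only a few legible Piece Two averages from Table II are included here.
AVERAGE_MINUTES = {
    (2, 1): 20.8,
    (2, 2): 19.7,
    (2, 3): 10.0,
    (2, 4): 76.4,
}


def estimate_minutes(items):
    """Sum the average completion times for the selected (piece, item) pairs."""
    return sum(AVERAGE_MINUTES[key] for key in items)


def completion_minutes(start, finish, break_minutes=0):
    """Elapsed minutes from start to finish, less any recorded breaks.

    Mirrors the Start / Finish / Subtract breaks / TOTAL rows of the record form.
    """
    elapsed = finish - start
    return elapsed / timedelta(minutes=1) - break_minutes


if __name__ == "__main__":
    plan = [(2, 1), (2, 2), (2, 3)]
    print(round(estimate_minutes(plan), 1))  # 50.5
    # Illustrative times only.
    start = datetime(1971, 1, 1, 8, 30)
    finish = datetime(1971, 1, 1, 10, 15)
    print(completion_minutes(start, finish, break_minutes=15))  # 90.0
```

Because the averages hide the wide site-to-site variance noted above, an administrator might prefer to plan from the maximum observed times instead; the same summing approach applies.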

Table III presents the estimates of item difficulty calculated from 
the field test. Most of the percentages of correct response ranged from 60 
to 80 percent, perhaps indicating that the items are not quite difficult 
enough. This indication is not of serious concern, however, because of the 
TABLE I 

SCHOOLS PARTICIPATING IN MACHINE SHOP FIELD TEST 

School                                              Level   No. of Students 
Diman Regional Vocational-Technical High School     9       6 
                                                    10      6 
                                                    11      4 
Greater Lawrence Regional Vocational High School    9T      6 
                                                    9RT     6 
                                                    11T     3 
                                                    11RT    3 
Nassau County BOCES                                 1       3 
                                                    2       2 

T = Test; RT = Retest 
TABLE II 

TESTING TIMES FOR MACHINE SHOP TEST ITEMS 
(Average Time in Minutes for Test Items, with Maximum and Minimum Times Observed at a Test Site) 

                Number    Average           Maximum Observed   Minimum Observed 
Piece   Item    Taking    Completion Time   Site Time          Site Time 
1       1       37        12.2              22.8               3.6 
1       2       --        --                17.5               3.2 
1       3       --        --                88.3               15.0 
1       4a      --        --                30.0               5.3 
1       4b      37        --                22.0               5.5 
1       4c      37        --                20.3               5.5 
1       5       37        --                25.0               1.0 
1       6       37        --                45.6               -- 
1       7       35        28.2              55.6               -- 
1       8       34        17.7              30.0               8.8 
1       9       33        --                18.5               2.8 
1       10      33        --                17.5               2.0 
1       11      29        29.9              72.3               6.0 
1       12      24        13.1              22.5               4.0 
1       13      3         10.0              10.0               10.0 
2       1       10        20.8              27.0               14.5 
2       2       9         19.7              30.0               11.5 
2       3       5         10.0              10.0               10.0 
2       4       11        76.4              84.1               67.0 
2       5       --        63.6              84.0               -- 
2       6       11        24.2              29.4               20.0 
2       7       11        44.1              47.5               40.0 
2       8       6         103.8             105.0              103.2 
TABLE III 

ITEM DIFFICULTY ESTIMATES FOR MACHINE SHOP TEST 
(Indices Report Percent Correct) 

Piece   Item    Number Taking   Number Correct   Percent Correct 
1       1       37              26               72 
1       2       36              24               -- 
1       3       38              27               70 
1       4a      37              14               38 
1       4b      37              15               40 
1       4c      37              16               42 
1       5       37              25               66 
1       6       37              24               65 
1       7       35              16               46 
1       8       34              27               80 
1       9       33              30               90 
1       10      33              28               84 
1       11      29              20               68 
1       12      24              16               67 
1       13      3               3                100 
2       1       10              10               100 
2       2       9               9                100 
2       3       5               5                100 
2       4       11              9                81 
2       5       11              7                63 
2       6       11              9                81 
2       7       11              9                81 
2       8       6               4                67 



number of advanced students taking the test. Item 4 on Piece One appears 
to be the most difficult and is perhaps the most desirable one from a 
difficulty level standpoint. No incorrect responses were recorded on Item 
13 of Piece One or Items 1, 2 and 3 of Piece Two. However, only a few 
students attempted these items, and due to their nature there appears to be 
no reason for concern. 
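The difficulty index in Table III is the number of students performing an item correctly divided by the number attempting it. A minimal sketch, assuming the (number taking, number correct) pairs below as read from Table III; the function name `percent_correct` is hypothetical, and a recomputed value can differ from the printed percentage by a point or so because the report does not state its rounding rule.

```python
def percent_correct(number_correct, number_taking):
    """Item difficulty index: percent of students attempting an item who performed it correctly."""
    if number_taking <= 0:
        raise ValueError("number_taking must be positive")
    return 100.0 * number_correct / number_taking


# (number taking, number correct) for a few Piece One items, as read from Table III.
TABLE_III_SAMPLE = {
    "4a": (37, 14),
    "4b": (37, 15),
    "10": (33, 28),
    "13": (3, 3),
}

if __name__ == "__main__":
    for item, (taking, correct) in TABLE_III_SAMPLE.items():
        print(item, round(percent_correct(correct, taking), 1))
```

Higher values indicate easier items, which is why Item 4, with the lowest percentages, is described as the most difficult.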

Table IV indicates the results of the reliability study. Because two of 
the participating schools failed to provide for adequate retesting, the 
reliability study had to be based on a smaller than desirable sample. 
Two indices of test-retest reliability were computed. The first coefficient 
represents the percent of agreement between the pretest and posttest 
performances on each item. The second reliability estimate is the 
correlation of pretest completion time to posttest completion time for each 
item. A few items gave sufficient evidence of weak reliability to merit 
analysis and further study. Conceptual analysis of the four items indicating 
low reliability estimates failed to produce any reasons to suspect their 
consistency. 
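The two indices described above can be computed directly from paired pretest and posttest records. The sketch below is illustrative only: it assumes each item's result is recorded as pass/fail (True/False) and each completion time in minutes, and the sample data are invented, not taken from Table IV. Pearson's product-moment formula is assumed for the correlation, since the report does not name the coefficient used; the function names are hypothetical.

```python
from statistics import mean


def percent_agreement(pretest, posttest):
    """Percent of students whose pass/fail result on an item matched on both testings."""
    if len(pretest) != len(posttest) or not pretest:
        raise ValueError("need paired, non-empty results")
    same = sum(1 for first, second in zip(pretest, posttest) if first == second)
    return 100.0 * same / len(pretest)


def pearson_r(x, y):
    """Pearson product-moment correlation between paired completion times."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all times are equal")
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Invented data for five students on one item.
    pre_pass = [True, True, False, True, False]
    post_pass = [True, False, False, True, False]
    pre_minutes = [12.0, 20.5, 31.0, 15.0, 25.0]
    post_minutes = [10.0, 19.0, 28.5, 16.0, 22.0]
    print(percent_agreement(pre_pass, post_pass))  # 80.0
    print(round(pearson_r(pre_minutes, post_minutes), 2))
```

With a sample as small as the eleven students retested here, a single student's change can move either index substantially, which is consistent with the report's call for a larger controlled study.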

Item 4 on Piece One showed low but acceptable reliability in terms of 
replicated performance success, but showed little or no consistency in terms 
of completion time. This phenomenon is perhaps due to the difficulty of the 
item. Items 2, 7 and 12 on Piece One indicated low percents of agreement 
(all near the 40 percent level). Items 2 and 7 also showed low completion 
time correlations. Perhaps their order of attempt could be an explanatory 
factor; further study should be conducted. The remainder of the items 
possessed acceptable reliabilities and percents of agreement (55 percent or 
above) and in most cases also showed acceptable correlations between 
completion times (.60 or above). The time reliability should improve with 
improved 



TABLE IV 

TEST-RETEST RELIABILITY ESTIMATES FOR THE 
MACHINE SHOP TEST ITEMS (N = 11*) 

                Percent of      Correlation Between 
Piece   Item    Agreement**     Completion Times*** 
1       1       56              -- 
1       2       37              -- 
1       3       100             -- 
1       4a      55              .08 
1       4b      44              .04 
1       4c      --              -- 
1       5       75              .27 
1       6       42              .22 
1       8       50              .71 
1       9       100             -- 
1       10      67              .73 
1       11      62              .65 
1       12      40              .98 
1       13      --              -- 
2       1       100             .22 
2       2       80              .65 
2       3       100             .36 
2       4       80              .46 
2       5       67              .69 
2       6       60              .70 
2       7       60              .25 
2       8       100             .67 

* Not every participating student completed each item. 
** Scoring same on posttest and pretest. 
*** Time on pretest correlated to time on posttest. 
timing and recording procedures. 

The field test generated the following comments concerning the machine 
shop test, the testing instructions and the recording forms: 
Diman Regional Vocational-Technical High School 

1. Type of steel: Piece #1 is machine steel, cold-rolled. 
   Piece #2 is carbon tool steel (finished piece can be used). 

2. Any operations may be altered to fit the school (e.g., on the Piece #1 
sequence Diman Regional used 1/2-20 instead of 5/8-18. Also, a different 
type of keyway was cut, as there were not enough machines to cut the 
Woodruff key). 

3. Sequences may be altered if machines are not available or other 
limitations arise. 

4. Only one piece of stock is available per student. Any mistakes should 
alter only one other operation at most. 

5. Checking can be done by the teacher at the end of the operation. There 
are too many operations to check after each one (each student should 
check his own after each operation). Estimated time to check each Piece 
#1 is 5-20 minutes. If time is tight, seniors may be used to check work. 

6. Tools required have not been listed so that each school can better fit 
the test to existing conditions (e.g., after numerical control, Piece #2 
may be strapped to the table, put in a vise, or have a special piece made 
up to hold it). 

   Alternate view: Do not list tools necessary; listing may cause rigidity 
   in viewing the test. 

7. Straight turning needs three time areas. 
1. Perhaps an arbor piece could be made or provided for each school to cut 
flutes with the dividing head. 

2. A staggered start could be made on Piece #2, some students doing 
numerical control. 

3. Type of metal should be listed for Piece #2 (No. 11 special). 

4. Make a note on the blueprint that the center hole can be left in. 

5. Could put a 5/16" ream at the tapered end of Piece #1: 

   a. Center drill 

   b. Drill and ream 

6. Set up the dividing head only once for all students. 

7. Cannot do Piece #2 work on a milling machine (Block 2 sequences). 

8. Change sequence from 1-2-3-4-5-6-7-8 to 2-3-8-1-4-5-6-7. 

9. Diameter should not be different: 4" vs. 3.75" on some pieces. 

10. Do only eight operations. 

General Comments: These tests are good for the students, as they uncovered 
weaknesses; i.e., students who were proficient in certain areas were given 
projects or "jobs" covering those skills while other areas were weakened or 
left undeveloped. (It was too much like industry with the specialization 
that went on in the shop.) 

Nassau County BOCES 

1. The Piece #1 test is tougher than the state practical test for shop 
teachers, which is a five and one-half hour test. Students are in machine 
shop about two hours per day all year. 

2. The Piece #1 test is good for all grade levels (two grade levels are 
mixed in together for machine shop). 

3. Make Piece #2 square so that the lathe is not required prior to milling. 

4. Make only one undercut on Piece #1. 

5. Make a tapered arbor out of Piece #1. A standard taper could be made 
instead of the listed taper. 

6. Eliminate the 3/4" thread and make the small thread a turn thread to 
"create a toolmaker rather than a machinist." 

7. Many of the constructive criticisms [discussed prior to the field test] 
were eliminated from my mind once I received the well-written instructions 
on the testing procedures. 

8. The objectives built into the drawings are excellent, with the possible 
exception of Piece #2 (Note #3), being actually used in the shop, due 
to safety reasons. 

9. The objectives pertaining to cutter grinding can be achieved through 
the grinding of standard type cutters. 



Revision Recommendations 

The field test brought out the need for several revisions. Items 12 
and 7 on Piece One and Item 3 on Piece Two should be redesigned. The 
grading items should be better structured and should be included as part 
of the total test. Improved instructions for timing and grading the 
performances must be written (perhaps by a machine shop teacher); possibly 
the instructions should be recorded on videotape. 

Before closing this report, two concerns must be discussed. The first 
is the potential for creating different forms of the test simply by 
changing the metal provided for the test item operations. It is conceivable 
that parallel forms of the test can be created by changing stock or by 
changing the machinery upon which the operations are performed. The change 
of materials is preferred, since the change of equipment may threaten the 
validity of the test in measuring the output of instruction. 

The second concern is with the centralized scoring kit. There appear 
to be two ways such a kit could be constructed, with a third option being 
that of the scoring center used during the field test. The first 
possibility for centralized scoring is the use of plastic measures 
color-coded to facilitate reporting and to disguise the correct response 
measure. These plastic pieces would be inversely shaped to the performance 
test products. For each test item, several plastic measures would be 
included in the kit: a red piece at the lower tolerance threshold, blue at 
the upper tolerance level, yellow in the center, green slightly below 
tolerance, and white slightly above tolerance. The student would be 
instructed to try each plastic piece until one fit his product; the color 
of the piece fitting would be recorded and keyed for scoring at a central 
location. 
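The color-key scheme amounts to a lookup held at the central scoring location. A minimal sketch follows; it assumes that the three colors at or between the tolerance limits (red, yellow, blue) count as acceptable and that green and white do not, and the function names and item labels are hypothetical. Keeping `WITHIN_TOLERANCE` only at the central office preserves the disguise the color coding is meant to provide, since the student reports a color rather than a judgment.

```python
# Meaning of each plastic-key color, as described for the proposed kit.
COLOR_MEANING = {
    "green": "slightly below tolerance",
    "red": "at the lower tolerance threshold",
    "yellow": "at the center of the tolerance band",
    "blue": "at the upper tolerance level",
    "white": "slightly above tolerance",
}

# Assumption: pieces at or between the tolerance limits count as acceptable.
WITHIN_TOLERANCE = {"red", "yellow", "blue"}


def score_color(color):
    """Return True if the recorded color indicates an in-tolerance product."""
    key = color.strip().lower()
    if key not in COLOR_MEANING:
        raise ValueError(f"unknown color code: {color!r}")
    return key in WITHIN_TOLERANCE


def score_report(report):
    """Score one student's report, a mapping of item label -> recorded color."""
    return {item: score_color(color) for item, color in report.items()}


if __name__ == "__main__":
    # Hypothetical item labels.
    print(score_report({"piece1_item5": "yellow", "piece1_item8": "white"}))
```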

The second possibility is similar to the first, except that it makes 
use of the tools normally used for measurement in the shop. A set of 
calipers would be customized to measure only the tolerances of each task. 
Each caliper would be numbered, and the student would report the number of 
the smallest caliper setting which fit the cut. Keyed caliper codes could 
be checked at the central office to determine that the measured size was 
within tolerance. Either of the two scoring forms described above would 
apparently be feasible, with the plastic keys being preferred because of 
the advantage of simplicity. 
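The caliper scheme works the same way, except that the central key maps a caliper number to a set size and compares it with each item's tolerance band. In this sketch the caliper sizes, the tolerance band (modeled on a .250 +.000/-.002 dimension) and the item label `sample_item` are all hypothetical.

```python
# Hypothetical central-office key: caliper number -> set size in inches.
CALIPER_SIZES = {1: 0.2480, 2: 0.2490, 3: 0.2500, 4: 0.2510}

# Hypothetical tolerance band (lower, upper) in inches for one test item,
# modeled on a .250 +.000/-.002 dimension.
TOLERANCES = {"sample_item": (0.2480, 0.2500)}


def within_tolerance(item, caliper_number, sizes=CALIPER_SIZES, tolerances=TOLERANCES):
    """Return True if the size of the reported caliper lies inside the item's band."""
    if caliper_number not in sizes:
        raise ValueError(f"unknown caliper number: {caliper_number}")
    size = sizes[caliper_number]
    lower, upper = tolerances[item]
    return lower <= size <= upper


if __name__ == "__main__":
    # A student reports caliper 3 as the smallest setting that fit the cut.
    print(within_tolerance("sample_item", 3))
```

As with the color keys, the student sees only caliper numbers, so the tolerance itself is not revealed at the testing site.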

Future development of the machine shop test should investigate the 
low reliabilities in a controlled study and experiment with the centralized 
scoring models. However, prior to any further work, a more detailed statement 
of mission should be formulated so that future studies can continue the 
development of the tests within those frameworks in which they will be 
most often used. 



